GEMM RCCL Overlap Comprehensive Performance Report

Complete Analysis: rocm-7.0.8-meta (Baseline) vs rocm-7.0.10-meta (Test)
Generated: 2025-12-08 14:59:23

Executive Summary

This report collects the performance analysis plots and metrics comparing RCCL between rocm-7.0.8-meta (baseline) and rocm-7.0.10-meta (test) across eight thread and channel configurations.

Test Configuration

  • Baseline: rocm-7.0.8-meta
  • Test Version: rocm-7.0.10-meta
  • Configurations: 8 total (256/512 threads × 28/42/56/70 channels)
  • Total Plots: 96 visualizations
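
The configuration grid above is the cross product of the two thread counts and the four channel counts. As a minimal sketch (the names `THREADS`, `CHANNELS`, and `configurations` are hypothetical and not taken from the report tooling), the eight configurations can be enumerated in the same order the sections below follow:

```python
from itertools import product

# Values copied from the "Configurations" line of this report.
THREADS = (256, 512)
CHANNELS = (28, 42, 56, 70)


def configurations():
    """Return every (threads, channels) pair, threads varying slowest."""
    return list(product(THREADS, CHANNELS))


if __name__ == "__main__":
    for threads, channels in configurations():
        # Mirrors the section headings used in this report.
        print(f"Configuration: {threads} Threads, {channels} Channels")
```

This only illustrates how the section headings map onto the grid; how each run was actually launched is not described in the report.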

Configuration: 256 Threads, 28 Channels

Base: rocm-7.0.8-meta
Test: rocm-7.0.10-meta

Overview Plots

Percentage Change Overview
Absolute Time Comparison
Performance Heatmap
Total Execution Time by Rank

Detailed Metrics

Computation Time Across Ranks
Communication Time Across Ranks
Idle Time Across Ranks
Percentage Difference All Metrics

NCCL Analysis

NCCL Latency Analysis
NCCL Summary Analysis

Configuration: 256 Threads, 42 Channels

Base: rocm-7.0.8-meta
Test: rocm-7.0.10-meta

Overview Plots

Percentage Change Overview
Absolute Time Comparison
Performance Heatmap
Total Execution Time by Rank

Detailed Metrics

Computation Time Across Ranks
Communication Time Across Ranks
Idle Time Across Ranks
Percentage Difference All Metrics

NCCL Analysis

NCCL Latency Analysis
NCCL Summary Analysis

Configuration: 256 Threads, 56 Channels

Base: rocm-7.0.8-meta
Test: rocm-7.0.10-meta

Overview Plots

Percentage Change Overview
Absolute Time Comparison
Performance Heatmap
Total Execution Time by Rank

Detailed Metrics

Computation Time Across Ranks
Communication Time Across Ranks
Idle Time Across Ranks
Percentage Difference All Metrics

NCCL Analysis

NCCL Latency Analysis
NCCL Summary Analysis

Configuration: 256 Threads, 70 Channels

Base: rocm-7.0.8-meta
Test: rocm-7.0.10-meta

Overview Plots

Percentage Change Overview
Absolute Time Comparison
Performance Heatmap
Total Execution Time by Rank

Detailed Metrics

Computation Time Across Ranks
Communication Time Across Ranks
Idle Time Across Ranks
Percentage Difference All Metrics

NCCL Analysis

NCCL Latency Analysis
NCCL Summary Analysis

Configuration: 512 Threads, 28 Channels

Base: rocm-7.0.8-meta
Test: rocm-7.0.10-meta

Overview Plots

Percentage Change Overview
Absolute Time Comparison
Performance Heatmap
Total Execution Time by Rank

Detailed Metrics

Computation Time Across Ranks
Communication Time Across Ranks
Idle Time Across Ranks
Percentage Difference All Metrics

NCCL Analysis

NCCL Latency Analysis
NCCL Summary Analysis

Configuration: 512 Threads, 42 Channels

Base: rocm-7.0.8-meta
Test: rocm-7.0.10-meta

Overview Plots

Percentage Change Overview
Absolute Time Comparison
Performance Heatmap
Total Execution Time by Rank

Detailed Metrics

Computation Time Across Ranks
Communication Time Across Ranks
Idle Time Across Ranks
Percentage Difference All Metrics

NCCL Analysis

NCCL Latency Analysis
NCCL Summary Analysis

Configuration: 512 Threads, 56 Channels

Base: rocm-7.0.8-meta
Test: rocm-7.0.10-meta

Overview Plots

Percentage Change Overview
Absolute Time Comparison
Performance Heatmap
Total Execution Time by Rank

Detailed Metrics

Computation Time Across Ranks
Communication Time Across Ranks
Idle Time Across Ranks
Percentage Difference All Metrics

NCCL Analysis

NCCL Latency Analysis
NCCL Summary Analysis

Configuration: 512 Threads, 70 Channels

Base: rocm-7.0.8-meta
Test: rocm-7.0.10-meta

Overview Plots

Percentage Change Overview
Absolute Time Comparison
Performance Heatmap
Total Execution Time by Rank

Detailed Metrics

Computation Time Across Ranks
Communication Time Across Ranks
Idle Time Across Ranks
Percentage Difference All Metrics

NCCL Analysis

NCCL Latency Analysis
NCCL Summary Analysis